Abstract The emerging interest of trading companies and hedge funds in mining the
social web has created new avenues for intelligent systems that use public
opinion to drive investment decisions. It is well accepted that in high
frequency trading, investors track memes rising in microblogging forums to
account for public behavior as an important feature when making short term
investment decisions. We investigate the complex relationship between tweet
board literature (such as bullishness, volume and agreement) and financial
market instruments (such as volatility, trading volume and stock prices).
We have analyzed Twitter sentiment in more than 4 million tweets posted between
June 2010 and July 2011 for DJIA, NASDAQ-100 and 11 other big cap technological
stocks. Our results show high correlation (up to 0.88 for returns) between
stock prices and Twitter sentiments. Further, using Granger's Causality
Analysis, we have validated that the movements of stock prices and indices
are greatly affected in the short term by Twitter discussions. Finally, we have
implemented the Expert Model Mining System (EMMS) to demonstrate that our
forecasted returns give a high value of R-square (0.952) with a low Maximum
Absolute Percentage Error (MaxAPE) of 1.76% for the Dow Jones Industrial
Average (DJIA). We introduce a novel way to use market monitoring elements
derived from public mood to keep a portfolio within a limited risk state
(highly improved hedging bets) during typical market conditions.

Keywords Stock market • sentiment analysis • Twitter • microblogging • 
social network analysis 
1 INTRODUCTION

Financial analysis and computational finance have been an active area of
research for many decades [18]. Over the years, several new tools and
methodologies have been developed that aim to predict the direction as well as
the range of financial market instruments as accurately as possible [17]. Before
the emergence of the internet, information regarding a company's stock price,
direction and general sentiment took a long time to disseminate among people.
Also, companies and markets took a long time (weeks or months) to calm market
rumors, news or false information (memes in the Twitter context). Web 3.0 is
characterized by fast paced information dissemination as well as retrieval [7].
Spreading good or bad information regarding a particular company, product or
person can be done at the click of a mouse [5], Q] or even using micro-blogging
services such as Twitter [26]. Recently scholars have made use of Twitter feeds
in predicting box office revenues [2], political game wagons [30], rate of flu
spread [29] and disaster news spread [12]. For short term trading decisions,
short term sentiments play a very important role in the short term performance
of financial market instruments such as indexes, stocks and bonds [20].

Early work on stock market prediction can be summarized as answering the
question: can stock prices really be predicted? There are two theories: (1) the
random walk theory and (2) the efficient market hypothesis (EMH) [25]. According
to the EMH, a stock index largely reflects the news already existing in the
investor community rather than present and past prices. On the other hand, the
random walk theory argues that prediction can never be accurate since the time
at which news arrives is unpredictable. Research conducted by Qian et al.
compared and summarized several theories that challenge the basics of the EMH as
well as the random walk model [57]. Based on these theories, it has been shown
that some level of prediction is possible based on various economic and
commercial indicators. The widely accepted semi-strong version of the EMH claims
that prices aggregate all publicly available information and instantly reflect
new public information [21]. It is well accepted that news drives macro-economic
movement in the markets, while research suggests that social media buzz is
highly influential at the micro-economic level, especially in big indices like
DJIA [5], [32], [15] and [35]. Earlier research has validated that the market is
completely driven by the sentiments and bullishness of investors' decisions
[27]. Thus a comprehensive model that incorporates these sentiments as a
parameter is expected to give superior prediction at the micro-economic level.

Earlier work by Bollen et al. shows how collective mood on Twitter (the
aggregate of all positive and negative tweets) is reflected in DJIA index
movements [5], [22]. In this work we apply a simple message board approach,
defining bullishness and agreement terms derived from the positive and negative
vector ends of public sentiment with respect to each market security or index
term (such as returns, trading volume and volatility). The proposed method is
not only scalable but also gives a more accurate measure of large scale investor
sentiment that can potentially be used for short term hedging strategies, as
discussed in section 6. It gives a clear and distinctive way of modeling
sentiment for service based companies such as Google in contrast to product
based companies such as Ebay, Amazon and Netflix. We validate that the Twitter
feed for any company reflects public mood dynamics comprising breaking news and
discussions, which is causative in nature. It therefore adversely affects
investment related decisions, which are not limited to stock discussions or the
profile of mood states of the entire Twitter feed.

In section 2 we discuss the motivation for this work, and in section 3 the
related work in the area of stock market prediction. In section 4 we explain the
techniques used in mining the data and the terminology used in market and tweet
board literature. In section 5 we present the prediction methods used in this
model along with the forecasting results. In section 6 we discuss how a Twitter
based model can be used by any trader to improve hedging decisions in a
diversified portfolio. Finally, in section 7 we discuss the results, and in
section 8 we present future prospects and conclude the work.

2 MOTIVATION 

"Communities of active investors and day traders who are sharing opinions 
and in some case sophisticated research about stocks, bonds and other finan- 
cial instruments will actually have the power to move share prices ...making 
Twitter-based input as important as any other data to the stock" 

-TIME (2009) [H] 

High Frequency Trading (HFT) comprises a very high percentage of trading volume
in the present US market. Traders take an investment position that is held only
for a very brief period of time, even just seconds, and rapidly trade into and
out of those positions, sometimes thousands or tens of thousands of times a day.
Therefore the value of an investment is only as good as the last known index
price. Investors will do anything that gives them an advantage in placing market
bets. A large percentage of high frequency traders in US markets have trained AI
bots to capture buzzing trends in social media feeds without learning the
dynamics of the sentiment or the accurate context of the deeper information
being diffused in social networks. For example, in February 2011 during the
Oscars, when Anne Hathaway was trending, the stock price of Berkshire Hathaway
rose by 2.94% [3]. Figure 1 highlights the incidents when the stock price of
Berkshire Hathaway jumped, coinciding with an increase of buzz on social
networks and micro-blogging websites regarding Anne Hathaway (for example during
movie releases).

The events are marked as red points in Figure 1. The event specific news for
each point is:

A: Oct. 3, 2008 - Rachel Getting Married opens: BRK.A up 0.44%

B: Jan. 5, 2009 - Bride Wars opens: BRK.A up 2.61%

C: Feb. 8, 2010 - Valentine's Day opens: BRK.A up 1.01%

D: March 5, 2010 - Alice in Wonderland opens: BRK.A up 0.74%

E: Nov. 24, 2010 - Love and Other Drugs opens: BRK.A up 1.62%

F: Nov. 29, 2010 - Anne announced as co-host of the Oscars: BRK.A up 0.25%

G: Feb. 28, 2011 - Anne hosts Oscars with James Franco: BRK.A up 2.94%

Fig. 1 Historical chart of Berkshire Hathaway (BRK.A) stock over the last 3
years. Highlighted points (A-F) are the days when its stock price jumped due to
an increased news volume on social networks and Twitter regarding Anne Hathaway.
Courtesy Google Finance.

As seen in this example, a large volume of tweets can create short term
influential effects on stock prices. Events such as these motivate us to
investigate the deeper relationship between the dynamics of social media
messages and market movements [18]. This work is not directed at finding a new
stock prediction technique that accounts for the effects of various other
macroeconomic factors.

The aim of this work is to quantitatively evaluate the effects of Twitter
sentiment dynamics around stock indices and stock prices, and to use them in
conjunction with the standard model to improve the accuracy of prediction.
Further, in section 6 we investigate how tweets can be useful in identifying
trends in futures and options markets and in building hedging strategies to
protect one's investment position in the shorter term.



3 RELATED WORK 

There have been several works related to web mining of data (blog posts,
discussion boards and news) [13], 0], and to validating the significance of
assessing behavioral changes in the public mood to track movements in stock
markets. Some early work shows that information from investor communities is
causative of speculation regarding private and forthcoming information and
commentaries [T^j, [31], [TU] and [9]. Dewally (2003) worked on a naive momentum
strategy, confirming that stocks recommended through user ratings had
significant prior performance in returns. With the shift in the online habits of
communities around the world, platforms like StockTwits¹ [32] and TweetTrader²
have emerged and their usage is spreading virally. Das and Chen made the initial
attempts by using natural language processing algorithms to classify stock
messages based on human trained samples. However, their results did not carry
statistically significant predictive relationships [TP].

1 http://stocktwits.com/

2 http://tweettrader.net/

Fig. 2 Flowchart of the proposed methodology showing the various phases of
sentiment analysis, beginning with tweet collection and ending with stock future
prediction. In the final phase, 4 sets of results are presented: (1) correlation
results for Twitter sentiments and stock prices for different companies; (2)
Granger's causality analysis to show that stock prices are affected in the short
term by Twitter sentiments; (3) use of EMMS for quantitative comparison in stock
market prediction using tweet features; (4) performance of the Twitter sentiment
forecasting method over different time windows.

Gilbert et al. and Zhang et al. have used a corpus of LiveJournal blog posts to
assess bloggers' sentiment along the dimensions of fear, anxiety and worry,
making use of Monte Carlo simulation to reflect market movements in the S&P 500
index [15], [35]. Similar and significantly accurate work was done by Bollen et
al., who used the dimensions of the Google-Profile of Mood States to reflect
changes in the closing price of DJIA [5]. Sprenger et al. analyzed individual
stocks for S&P 100 companies and tried correlating tweet features with
discussions about the particular companies containing the ticker symbol [28].
However, these approaches have been restricted to community sentiment at the
macro-economic level, which does not give an explanatory dynamic system for the
individual stock index of a company. We therefore derive a model that is
scalable to individual stocks and companies and can be exploited to build
successful hedging strategies, as discussed in section 6.



4 WEB MINING AND DATA PROCESSING 

In this section we describe our method of Twitter and financial data collection,
as shown in Figure 2. In the first phase, we mine the tweet data; after removal
of spam and noisy tweets, the remaining tweets are subjected to sentiment
assessment tools in phase two. In later phases, feature extraction, aggregation
and analysis are done.



4.1 Tweets Collection and Processing 

Among investor forums and discussion boards, Twitter has the widest acceptance
in the financial community, and all messages are accessible through a simple
search of requisite terms via an application programming interface (API)³.
Sub-forums of Twitter like StockTwits and TweetTrader have emerged recently as
the hottest places for investors to discuss buying and selling at a voluminous
rate. Efficient mining of the sentiment aggregated around these tweet feeds
gives us an opportunity to trace the relationships around these market sentiment
terminologies. Currently more than 250 million messages are posted on Twitter
every day (TechCrunch, October 2011)⁴.

This study was conducted over a period of 14 months, from June 2nd 2010 to July
29th 2011. During this period, we collected 4,025,595 English language tweets
(by around 1.08M users). Each tweet record contains (a) tweet identifier, (b)
date/time of submission (in GMT), (c) language and (d) text. Stop words and
punctuation are then removed and the tweets are grouped by day (the finest time
window in this study, since we do not group tweets further by hours or minutes).
We have focused on DJIA, NASDAQ-100 and the 11 major companies listed in Table
1. These companies are some of the most highly traded and discussed technology
stocks, with very high tweet volumes.
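The cleaning and daily grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the record field names (`created_at`, `language`, `text`), the timestamp format and the tiny stop-word list are all hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "for"}

def clean_text(text):
    """Lower-case, replace punctuation (except the $ ticker prefix) and drop stop words."""
    words = re.sub(r"[^\w\s$]", " ", text.lower()).split()
    return " ".join(w for w in words if w not in STOP_WORDS)

def group_by_day(tweets):
    """Group cleaned English tweets by their GMT submission date."""
    by_day = defaultdict(list)
    for tweet in tweets:
        if tweet["language"] != "en":
            continue
        # Assumed timestamp layout; the study only states that times are in GMT.
        day = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S").date()
        by_day[day.isoformat()].append(clean_text(tweet["text"]))
    return dict(by_day)

# Made-up records for illustration only.
sample = [
    {"id": 1, "created_at": "2010-06-02 14:05:00", "language": "en",
     "text": "The new $AAPL phone is great!"},
    {"id": 2, "created_at": "2010-06-02 21:40:00", "language": "en",
     "text": "Selling $AAPL, too expensive."},
    {"id": 3, "created_at": "2010-06-03 09:10:00", "language": "en",
     "text": "Earnings for $MSFT on Thursday"},
]
daily = group_by_day(sample)
```

Keeping the `$` character is a design choice of this sketch, so that ticker mentions such as `$AAPL` survive punctuation removal.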

As seen in Figure 3, the average message volume for the 11 companies used to
validate the working model is higher than the average discussion volume of DJIA
and NASDAQ-100. In this study we have observed that technology stocks generally
have a higher tweet volume than non-technology stocks. One reason for this may
be that technology companies come out with new products and announcements much
more frequently than companies in other sectors (say infrastructure, energy,
FMCG, etc.), thereby generating greater buzz on social media networks. However,
our model may be applied to any company or index that generates a high tweet
volume.

3 The Twitter API is easily accessible through the documentation available at
https://dev.twitter.com/docs. Also, Gnip (http://gnip.com/twitter), the premium
platform for purchasing the public firehose of tweets, has many investors as
financial customers researching in the area.

4 http://techcrunch.com/2011/10/17/twitter-is-at-250-million-tweets-per-day/

Table 1 List of Companies

Company Name          Ticker Symbol
Amazon                AMZN
Apple                 AAPL
AT&T                  T
Dell                  DELL
EBay                  EBAY
Google                GOOG
Microsoft             MSFT
Oracle                ORCL
Samsung Electronics   SSNLF
SAP                   SAP
Yahoo                 YHOO



Fig. 3 Graph for average of log of daily volume over the months under study 



4.2 Sentiment Classification 

In order to compute sentiment for any tweet, we classify each incoming tweet
every day as positive or negative using a Naive Bayes classifier. For each day,
the total number of positive tweets is aggregated as Positive_day and the total
number of negative tweets as Negative_day. We have made use of the JSON API from
Twittersentiment⁵, a service provided by the Stanford NLP research group [16].
The online classifier uses the Naive Bayesian classification method, one of the
most successful and highly researched classification algorithms, giving superior
performance to other methods in the context of tweets. Its classification
training was done over a dataset of 1,600,000 tweets and achieved an accuracy of
about 82.7%. These methods have high replicability and few arbitrary fine tuning
elements.

5 https://sites.google.com/site/twittersentimenthelp/

In our dataset roughly 61.68% of the tweets are positive, while 38.32% are
negative for the company stocks under study. This ratio of roughly 3:2 indicates
that stock discussions are much more balanced in terms of bullishness than
internet board messages, where the ratio of positive to negative ranges from 7:1
[H] to 5:1 [13]. The balanced distribution of stock discussion gives us more
confidence to study the information content of the positive and negative
dimensions of discussion about stock prices on microblogs.



4.3 Tweet Feature Extraction 

One of the research questions this study explores is how investment decisions
for technological stocks are affected by the entropy of information spread about
the companies under study in the virtual space. Tweet messages are
micro-economic factors that affect stock prices, which is quite a different type
of relationship from factors like news aggregates from traditional media, chat
board rooms, etc., which are covered in earlier studies over a particular period
[11], [TH] and [13]. Keeping this in mind, we have aggregated the tweet
parameters (extracted from tweet features) only over a day. To calculate weekly,
bi-weekly, tri-weekly, monthly, 5-weekly and 6-weekly parameters, we simply take
the average of the daily Twitter feeds over the requisite period of time.
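As an illustration of this averaging step, the sketch below collapses a daily series into longer windows. It assumes non-overlapping blocks and five trading days per week; the text does not say whether the windows overlap or whether calendar or trading days are counted, so both are assumptions, and the values are made up.

```python
def window_average(daily_values, window_days):
    """Average consecutive, non-overlapping blocks of `window_days` daily values.

    A trailing partial block is averaged over the days it actually contains.
    """
    averages = []
    for start in range(0, len(daily_values), window_days):
        block = daily_values[start:start + window_days]
        averages.append(sum(block) / len(block))
    return averages

# Made-up daily bullishness values covering two trading weeks.
daily_bullishness = [0.2, 0.4, 0.6, 0.8, 1.0, 0.0, 0.5, 0.3, 0.1, 0.7]
weekly = window_average(daily_bullishness, 5)   # assuming 5 trading days per week
```

The same helper gives bi-weekly or monthly series by passing a larger `window_days`.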




Fig. 4 Tweet Sentiment and Market Features 



Twitter literature from the perspective of stock investment is summarized in
Figure 4. We carry forward the work of Antweiler et al. in defining bullishness
($B_t$) for each day (or time window), given by the equation:

$$B_t = \ln\left(\frac{1 + M_t^{Positive}}{1 + M_t^{Negative}}\right) \qquad (1)$$

where $M_t^{Positive}$ and $M_t^{Negative}$ represent the number of positive and
negative tweets on a particular day $t$. The logarithm of bullishness measures
the share of surplus positive signals and also gives more weight to a larger
number of messages in a specific sentiment (positive or negative). Message
volume for a time interval $t$ is simply defined as the natural logarithm of the
total number of tweets for a specific stock or index, which is
$\ln(M_t^{Positive} + M_t^{Negative})$. The agreement among positive and
negative tweet messages is given by:

$$A_t = 1 - \sqrt{1 - \left(\frac{M_t^{Positive} - M_t^{Negative}}{M_t^{Positive} + M_t^{Negative}}\right)^2} \qquad (2)$$

If all tweet messages about a particular company are bullish or bearish,
agreement would be 1 in that case. The influence of silent tweet days in our
study (trading days when no tweeting happens about a particular company) is less
than 0.1%, which is significantly less than in previous research [13], [28].
Carried terminologies for all the tweet features {Positive, Negative,
Bullishness, Message Volume, Agreement} remain the same for each day with a lag
of one day. For example, carried bullishness for day d is given by
$CarriedBullishness_{d-1}$.


4.4 Financial Data Collection 

We have downloaded financial stock prices at daily intervals from the Yahoo
Finance API⁶ for DJIA, NASDAQ-100 and the companies under study given in Table
1. The financial features (parameters) under study are the opening ($O_t$) and
closing ($C_t$) value of the stock or index, the highest ($H_t$) and lowest
($L_t$) value of the stock or index, and returns. Returns are calculated as the
difference of the natural logarithms of the closing values of the stock price on
a particular day and the previous day:

$$R_t = \{\ln Close_{(t)} - \ln Close_{(t-1)}\} \times 100 \qquad (3)$$

Trading volume is the logarithm of number of traded shares. We estimate 
daily volatility based on intra-day highs and lows using Garman and Klass 
volatility measures [TJ given by the formula: 
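For illustration, the sketch below computes the percentage log return of equation (3) together with the commonly cited single-day form of the Garman-Klass variance estimator, $0.5\,[\ln(H_t/L_t)]^2 - (2\ln 2 - 1)\,[\ln(C_t/O_t)]^2$. Whether the study uses exactly this form, or a variant averaged over several days, is an assumption here; the prices are invented.

```python
import math

def log_return(close_today, close_prev):
    """Percentage log return between two consecutive closing values."""
    return (math.log(close_today) - math.log(close_prev)) * 100

def garman_klass_variance(open_, high, low, close):
    """Single-day Garman-Klass variance estimate from open/high/low/close.

    Textbook form: 0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2.
    """
    return (0.5 * math.log(high / low) ** 2
            - (2 * math.log(2) - 1) * math.log(close / open_) ** 2)

# Made-up prices for two consecutive days: (open, high, low, close)
day1 = (100.0, 103.0, 99.0, 102.0)
day2 = (102.0, 104.0, 100.0, 101.0)

r2 = log_return(day2[3], day1[3])                 # return on day 2
vol2 = math.sqrt(garman_klass_variance(*day2))    # daily volatility on day 2
```

The square root converts the variance estimate into a volatility on the same scale as a daily log price change.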



5 STATISTICAL ANALYSIS AND RESULTS 

We begin our study by identifying the correlation between the Twitter feed
features and stock/index parameters, which gives encouraging values of
statistically significant relationships with respect to individual stocks
(indices). To validate the causative effect of tweet feeds on stock movements we
have used the econometric technique of Granger's Causality Analysis.
Furthermore, we make use of the expert model mining system (EMMS) to propose an
efficient prediction model for the closing price of DJIA and NASDAQ-100. Since
this model does not allow us to draw conclusions about how the accuracy of
prediction differs across time windows of different sizes, that question is
discussed later in this section.

6 http://finance.yahoo.com/

5.1 Correlation Matrix 

For the stock indices DJIA and NASDAQ-100 and the 11 tech companies under study,
we have computed the correlation matrix given in Figure 10 in the appendix
between the financial market features and the Twitter sentiment features
explained in section 4. The financial features for each stock/index (Open,
Close, Return, Trade Volume and Volatility) are correlated with the Twitter
features (Positive, Negative, Bullishness, Carried Positive, Carried Negative
and Carried Bullishness). The time period under study is the monthly average, as
it is the most accurate time window and gives significant values compared with
other time windows, as discussed later in section 5.4.

Our approach shows strong correlation values between various features (up to
-0.96 for the opening price of Oracle and 0.88 for returns from the DJIA index),
and the average value of correlation between various features is around 0.5. By
comparison, the highest correlation values from earlier work have been around
0.41 [3H]. As the relationships between the stock (index) parameters and Twitter
features differ in magnitude and sign for different stocks (indices), a uniform
standardized model would not be applicable to all stocks (indices). Therefore,
building an individual model for each stock (index) is the appropriate approach
for gaining appreciable insight into the prediction techniques. Trading volume
is mostly governed by the agreement values of tweet feeds, with correlations of
-0.7 for same day agreement and -0.65 for DJIA. Returns are mostly correlated
with same day bullishness (0.61) and, with slightly lesser magnitude (0.6), with
carried bullishness for DJIA. Volatility is again dependent on most of the
Twitter features, with correlation as high as 0.77 for same day message volume
for NASDAQ-100.

One of the anomalies we have observed is that EBay gives negative correlation
between all the features, due to heavy product based marketing on Twitter, which
turns out not to be a correct indicator of the average growth returns of the
company itself.
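A correlation matrix of this kind can be sketched with a plain Pearson coefficient computed for every pair of Twitter feature and market feature. The series below are invented monthly averages for a single hypothetical index, and the dictionary layout is an assumption chosen for readability rather than the format of the appendix figure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up monthly averages for one index.
features = {
    "bullishness": [0.10, 0.35, 0.20, 0.50, 0.45, 0.60],
    "agreement":   [0.30, 0.10, 0.25, 0.05, 0.12, 0.02],
}
market = {
    "return":       [0.2, 0.9, 0.4, 1.3, 1.0, 1.6],
    "trade_volume": [18.1, 18.9, 18.3, 19.4, 19.0, 19.8],
}

# One entry per (Twitter feature, market feature) pair.
matrix = {(f, m): pearson_r(fv, mv)
          for f, fv in features.items() for m, mv in market.items()}
```

Because sign and magnitude differ from stock to stock, such a matrix would be built separately for each stock or index.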

5.2 Bivariate Granger Causality Analysis 

The results in the previous section show strong correlation between financial
market parameters and Twitter sentiments. However, the results also raise a
point of discussion: do market movements affect Twitter sentiments, or do
Twitter features cause changes in the markets? To investigate this question we
apply Granger Causality Analysis (GCA) to the time series, averaged over weekly
time windows, of returns from DJIA and NASDAQ-100 and the Twitter features
(positive, negative, bullishness, message volume and agreement). GCA is not used
to establish causality, but as an econometric tool to investigate a statistical
pattern of lagged correlation. A similar observation, that clouds precede rain,
is widely accepted: it shows that clouds may contain something that causes rain,
but the clouds themselves may not be the actual cause of the event.

Table 2 Granger's Causality Analysis of DJIA and NASDAQ for 7 week lag Twitter
sentiments. (NSDQ is short for NASDAQ)

Index  Lag  Positive  Negative  Bull      Agrmnt    Msg Vol   Carr      Carr      Carr      Carr      Carr
                                                              Positive  Negative  Bull      Agrmnt    Msg Vol
DJIA   1    0.614     0.122     0.891     0.316     0.765     0.69      0.103     0.785     0.759     0.934
DJIA   2    0.033**   0.307     0.037**   0.094*    0.086**   0.032**   0.301**   0.047**   0.265     0.045**
DJIA   3    0.219     0.909     0.718     0.508     0.237     0.016**   0.845     0.635     0.357     0.219
DJIA   4    0.353     0.551     0.657     0.743     0.743     0.116     0.221     0.357     0.999     0.272
DJIA   5    0.732     0.066     0.651     0.553     0.562     0.334     0.045**   0.394     0.987     0.607
DJIA   6    0.825     0.705     0.928     0.554     0.732     0.961     0.432     0.764     0.261     0.832
DJIA   7    0.759     0.581     0.809     0.687     0.807     0.867     0.631     0.987     0.865     0.969
NSDQ   1    0.106     0.12      0.044**   0.827     0.064*    0.02**    0.04**    0.043**   0.704     0.071*
NSDQ   2    0.048**   0.219     0.893     0.642     0.022**   0.001**   0.108     0.828     0.255     0.001**
NSDQ   3    0.06*     0.685     0.367     0.357     0.135     0.01**    0.123     0.401     0.008**   0.131
NSDQ   4    0.104     0.545     0.572     0.764     0.092*    0.194     0.778     0.649     0.464     0.343
NSDQ   5    0.413     0.997     0.645     0.861     0.18      0.157     0.762     0.485     0.945     0.028
NSDQ   6    0.587     0.321     0.421     0.954     0.613     0.795     0.512     0.898     0.834     0.591
NSDQ   7    0.119     0.645     0.089     0.551     0.096     0.382     0.788     0.196     0.648     0.544

GCA rests on the assumption that if a variable X causes Y, then changes in X
will systematically occur before changes in Y. Lagged values of X should then
bear significant correlation with Y. However, correlation does not necessarily
imply causation. We have made use of GCA in a similar fashion to [5], [15], to
test whether one time series is significant in predicting another time series.
Let returns $R_t$ be reflective of fast movements in the stock market. To verify
the change in returns with the change in Twitter features, we compare the
variance explained by the following linear models in equation 5 and equation 6:

$$R_t = \alpha + \sum_{i=1}^{n} \beta_i R_{t-i} + \epsilon_t \qquad (5)$$

$$R_t = \alpha + \sum_{i=1}^{n} \beta_i R_{t-i} + \sum_{i=1}^{n} \gamma_i X_{t-i} + \epsilon_t \qquad (6)$$

Equation 5 uses only the $n$ lagged values of $R_t$, i.e.
$(R_{t-1}, \ldots, R_{t-n})$, for prediction, while equation 6 uses the $n$
lagged values of both $R_t$ and the tweet feature time series, given by
$X_{t-1}, \ldots, X_{t-n}$. We have taken a weekly time window to validate the
causality performance, hence the lag values⁷ are calculated over the weekly
intervals 1, 2, ..., 7.
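The comparison of the two models can be sketched as an F test on their residual sums of squares: a restricted regression of returns on their own lags against an unrestricted regression that adds the lagged tweet feature. The code below is a self-contained illustration on synthetic data, not the procedure run in the study; the helper names and the normal-equation solver are assumptions, and the p-value (which would come from the F distribution with `n_lags` and `dof` degrees of freedom) is left out to avoid external dependencies.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations + elimination)."""
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):                      # Gauss-Jordan with partial pivoting
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(returns, feature, n_lags):
    """F statistic: own-lag model (restricted) vs. own-lag plus feature-lag model."""
    restricted, unrestricted, y = [], [], []
    for t in range(n_lags, len(returns)):
        own = [returns[t - i] for i in range(1, n_lags + 1)]
        other = [feature[t - i] for i in range(1, n_lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
        y.append(returns[t])
    rss_r, rss_u = ols_rss(restricted, y), ols_rss(unrestricted, y)
    dof = len(y) - (2 * n_lags + 1)           # observations minus unrestricted parameters
    return ((rss_r - rss_u) / n_lags) / (rss_u / dof)

# Synthetic weekly data in which the feature leads returns by one week.
rng = random.Random(42)
tweet_feature = [rng.uniform(-1, 1) for _ in range(60)]
weekly_returns = [0.0] + [0.8 * tweet_feature[t - 1] + rng.uniform(-0.1, 0.1)
                          for t in range(1, 60)]

f_feature_to_returns = granger_f(weekly_returns, tweet_feature, n_lags=1)
f_returns_to_feature = granger_f(tweet_feature, weekly_returns, n_lags=1)
```

A large F statistic in one direction and a small one in the other is the pattern that the lagged-correlation argument looks for; repeating the call for `n_lags` from 1 to 7 would mirror the weekly lags considered here.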

From Table 2, we can reject the null hypothesis ($H_0$) that the Twitter
features do not affect returns in the financial markets, i.e. that the
coefficients on the lagged Twitter features are all zero, with a high level of
confidence (low p-values). However, as we see, the result applies only to
specific negative and positive tweets (** for p-value < 0.05 and * for p-value
< 0.1, which correspond to 95% and 90% confidence levels respectively). Other
features like agreement and message volume do not have a significant causal
relationship with the returns of a stock index (high p-values).

7 The lag at k for any parameter M at week $x_t$ is the value of the parameter
at week $x_{t-k}$. For example, the value of returns for the month of April, at
a lag of one month, will be $return_{april-1}$, which is $return_{march}$.



5.3 EMMS Model for Forecasting 



We have used Expert Model Mining System (EMMS) which incorporates a 
set of competing methods such as Exponential Smoothing (ES), Auto Re- 
gressive Integrated Moving Average (ARIMA) and seasonal ARIMA models. 
These methods are widely used in financial modeling to predict the values of 
stocks/bonds/commodities/etc (23"ll6]. These methods are suitable for constant 
level, additive trend or multiplicative trend and with either no seasonality, ad- 
ditive seasonality, or multiplicative seasonality. 

In this work, the selection criterion for the EMMS is the coefficient of
determination (R-squared), which is the square of Pearson's r between the fit
values (from the EMMS model) and the actual observed values. Mean absolute
percentage error (MAPE) and maximum absolute percentage error (MaxAPE) are the
mean and maximum values of the error (the difference between fit value and
observed value, in percent). To show the performance of tweet features in the
prediction model, we have applied the EMMS twice: first with tweet features as
independent predictor events and a second time without them. This provides a
quantitative comparison of the improvement in prediction from using tweet
features.
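The three fit statistics can be written down directly from these definitions. The sketch below is illustrative only: the function names are hypothetical and the observed and fit values are invented closing levels, not output of the EMMS.

```python
def r_squared(fit, observed):
    """Square of Pearson's r between fit and observed values."""
    n = len(fit)
    mean_f, mean_o = sum(fit) / n, sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(fit, observed))
    var_f = sum((f - mean_f) ** 2 for f in fit)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_f * var_o)

def absolute_percentage_errors(fit, observed):
    """Per-observation absolute error as a percentage of the observed value."""
    return [abs(f - o) / abs(o) * 100 for f, o in zip(fit, observed)]

def mape(fit, observed):
    """Mean absolute percentage error."""
    errors = absolute_percentage_errors(fit, observed)
    return sum(errors) / len(errors)

def max_ape(fit, observed):
    """Maximum absolute percentage error."""
    return max(absolute_percentage_errors(fit, observed))

observed = [100.0, 102.0, 101.0, 105.0]   # made-up closing values
fit      = [101.0, 101.0, 102.0, 105.0]   # made-up model fit values
```

Running the same three functions on the fit with and without tweet predictors gives the kind of side-by-side comparison described above.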

ARIMA (p,d,q) models are, in theory and practice, the most general class of
models for forecasting a time series that can be made stationary by a series of
transformations such as differencing or logging of the series $Y_i$. For a
non-seasonal ARIMA (p,d,q) model, p is the number of autoregressive terms, d is
the number of non-seasonal differences and q is the number of lagged forecast
errors in the predictive equation. A stationary time series $\Delta Y$,
differenced d times, has stochastic component

$$\Delta Y_i \sim Normal(\mu_i, \sigma^2) \qquad (7)$$

where $\mu_i$ and $\sigma^2$ are the mean and variance of the normal
distribution, respectively. The systematic component is modeled as:

$$\mu_i = \alpha_1 \Delta Y_{i-1} + \ldots + \alpha_p \Delta Y_{i-p} + \theta_1 \epsilon_{i-1} + \ldots + \theta_q \epsilon_{i-q} \qquad (8)$$

where $\Delta Y$ are the lag-p observations from the stationary time series,
with associated parameter vector $\alpha$, and $\epsilon_i$ are the lagged
errors of order q, with associated parameter vector $\theta$. The expected value
is the mean of simulations from the stochastic component,

$$E(Y_i) = \mu_i = \alpha_1 \Delta Y_{i-1} + \ldots + \alpha_p \Delta Y_{i-p} + \theta_1 \epsilon_{i-1} + \ldots + \theta_q \epsilon_{i-q} \qquad (9)$$
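To make the roles of p, d and q concrete, the sketch below works through the simplest special case, ARIMA(1,1,0) without a constant: difference once, estimate the single autoregressive coefficient by least squares, and forecast one step ahead. This is not the EMMS, which selects among competing models automatically; the function names and the index levels are assumptions for illustration.

```python
def difference(series, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def fit_ar1(stationary):
    """Least-squares estimate of alpha_1 in dY_i = alpha_1 * dY_{i-1} + e_i (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(stationary, stationary[1:]))
    den = sum(prev ** 2 for prev in stationary[:-1])
    return num / den

def forecast_next(series):
    """One-step-ahead forecast of the level from an ARIMA(1,1,0) fit without a constant."""
    dy = difference(series, d=1)
    alpha1 = fit_ar1(dy)
    # Predicted next difference is alpha_1 times the last observed difference.
    return series[-1] + alpha1 * dy[-1]

weekly_close = [100.0, 102.0, 103.0, 103.5, 103.75]   # made-up index levels
alpha1 = fit_ar1(difference(weekly_close))
next_value = forecast_next(weekly_close)
```

Here the weekly differences halve each week, so the estimated coefficient is 0.5 and the forecast adds half of the last change to the last level.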
Table 3 EMMS model fit characteristics for DJIA and NASDAQ-100

                         Model Fit statistics            Ljung-Box Q(18)
Index       Predictors   R-squared  MaxAPE  Direction    Statistics  DF  Sig.
Dow-30      Yes          0.95       1.76    90.8         11.36       18  0.88
Dow-30      No           0.92       2.37    60           9.9         18  0.94
NASDAQ-100  Yes          0.68       2.69    82.8         23.33       18  0.18
NASDAQ-100  No           0.65       2.94    55.8         16.93       17  0.46


A seasonal ARIMA model is of the form ARIMA (p,d,q) (P,D,Q), where P specifies
the seasonal autoregressive order, D is the seasonal differencing order and Q is
the seasonal moving average order. Another advantage of the EMMS model is that
it automatically selects the most significant predictors among all those
available.

In the dataset we have time series for a total of approximately 60 weeks (422
days), of which we use approximately 75%, i.e. 45 weeks, for training both the
models, with and without the predictors, over the period June 2nd 2010 to April
14th 2011. We then verify model performance as a one step ahead forecast over
the testing period of 15 weeks from April 15th to July 29th 2011, which accounts
for a wide and robust range of market conditions. Forecasting accuracy in the
testing period is compared for both models in each case in terms of maximum
absolute percentage error (MaxAPE), mean absolute percentage error (MAPE) and
direction accuracy. MAPE is given by equation 10, where $\hat{y}_i$ is the
predicted value and $y_i$ is the actual value.

$$MAPE = \frac{\sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{y_i}}{n} \times 100 \qquad (10)$$

Direction accuracy is a measure of how accurately the market or commodity
up/down movement is predicted by the model, which is technically defined as the
logical value of $(y_{i,t+1} - y_{i,t}) \times (\hat{y}_{i,t+1} - y_{i,t}) > 0$.
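Direction accuracy can be sketched as the share of steps on which the predicted move and the actual move, both measured from the last actual value, have the same sign. The function below is a minimal illustration with invented values; counting a zero move as a miss is an assumption, since the text does not say how ties are treated.

```python
def direction_accuracy(actual, predicted):
    """Percentage of steps where the predicted move has the same sign as the actual move.

    actual[t] and predicted[t] refer to the same period; predicted[t + 1] is the
    one-step-ahead forecast made with information up to period t.
    """
    steps = len(actual) - 1
    hits = 0
    for t in range(steps):
        actual_move = actual[t + 1] - actual[t]
        predicted_move = predicted[t + 1] - actual[t]
        if actual_move * predicted_move > 0:   # same sign; zero moves count as misses
            hits += 1
    return 100.0 * hits / steps

actual    = [100.0, 102.0, 101.0, 104.0, 103.0]   # made-up weekly closes
predicted = [100.0, 101.0, 103.0, 102.5, 103.5]   # made-up one-step-ahead forecasts
accuracy = direction_accuracy(actual, predicted)
```

On these values the forecast gets the direction right on three of the four steps.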

As we can see in Table 3, there is a significant reduction in MaxAPE for DJIA
(2.37 to 1.76) and NASDAQ-100 (2.96 to 2.69) when the EMMS model is used with
predictors as events, which in our case are all the tweet features (positive,
negative, bullishness, message volume and agreement). Using tweet features as
part of the prediction process in the EMMS model gives a more robust approach
than the traditional forecasting methods. There is a significant decrease in
the value of MAPE for DJIA, which is 0.8 in our case compared with 1.79 for
earlier approaches [5]. As we can see from the values of R-square, MAPE and
MaxAPE in Table 3 for both DJIA and NASDAQ-100, our proposed model uses
Twitter sentiment analysis to achieve superior performance over traditional
methods. Since EMMS is a customizable and scalable technique, we expect our
proposed model to perform well across a wide range of stocks and indices.

Figure 5 shows the EMMS model fit for weekly closing values of DJIA and
NASDAQ-100. In the figure, fit denotes the model fit values, observed denotes
the values of the actual index, and UCL and LCL are the upper and lower
confidence limits of the prediction model.
5.4 Prediction Accuracy using OLS Regression

Our results in the previous section showed that the forecasting performance of
stocks/indices using Twitter sentiments varies across time windows. Hence it
is important to quantitatively deduce the time window that gives the most
accurate prediction. Figure 6 shows the R-square metric for OLS regression of
returns from the stock indices NASDAQ-100 and DJIA on the tweet board features
(number of positive and negative tweets, bullishness, agreement and message
volume), both for carried (at 1-day lag) and same-week features.



The R-square metric (explained in section 5.3) is calculated as the prediction
performance indicator for time windows ranging from daily, weekly and
bi-weekly up to a 6-weekly window. From Figure 6 it can be inferred that as we
increase the time window, prediction accuracy increases, but only up to a
certain point (monthly in our case), beyond which the value of R-square starts
decreasing again. Thus, monthly predictions give the highest accuracy in
predicting anomalies in the returns from the tweet board features.
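
The window comparison can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a single predictor rather than the full set of tweet features, averages non-overlapping blocks to form each time window, and the helper names and sample numbers are hypothetical rather than the procedure or data used in this paper:

```python
from statistics import mean


def aggregate(series, window):
    """Average consecutive, non-overlapping blocks of `window` observations."""
    return [
        mean(series[i:i + window])
        for i in range(0, len(series) - window + 1, window)
    ]


def r_squared(x, y):
    """R-square of the one-predictor OLS fit y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily bullishness scores and index returns (in percent).
bullishness = [0.2, 0.5, 0.1, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5, 0.8, 0.4, 0.6]
returns = [0.1, 0.4, -0.2, 0.5, 0.3, 0.0, 0.6, -0.1, 0.2, 0.9, 0.1, 0.5]

for window in (1, 2, 3):
    x = aggregate(bullishness, window)
    y = aggregate(returns, window)
    print(window, round(r_squared(x, y), 3))
```

Repeating the fit for each window length and comparing the resulting R-square values mirrors the comparison plotted in Figure 6, with the caveat that fewer observations remain as the window grows.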

In the next section we discuss a practical implementation: how short-term
hedging strategies can be made more efficient by modeling mass public opinion
and behavior towards a particular company or stock index through mining of
tweet sentiments.

Fig. 6 Plot of R-square values over different time windows for DJIA and NASDAQ-100. 
Higher values denote greater prediction accuracy. 

6 HEDGING STRATEGY USING TWITTER SENTIMENT 
ANALYSIS 

Portfolio protection is an important practice that is weighted as much as
portfolio appreciation. Just as a person purchases insurance for a house, a
car or any commodity, one can also buy insurance for an investment made in
stock securities. This doesn't prevent a negative event from happening, but if
it does happen and you're properly hedged, the impact of the event is reduced.
In a diverse portfolio, hedging against investment risk means strategically
using instruments in the market to offset the risk of any adverse price
movements. Technically, to hedge, an investor invests in two securities with
negative correlation, which is itself a time-varying statistic.

To explain how a weekly forecast based on mass tweet sentiment features can be
useful for an individual investor, we take the help of a simple example.

Let us assume that a share of company C1 is available for $X per share and
the premium for a stock option of company C1 (with strike price $X) is $Y.

A = total amount invested in shares of company C1, which is the number of
shares (let it be N) x $X

B = total amount invested in put options of company C1 (relevant block size
x $Y)

For an effective investment, always (N x $X) > (block size x $Y).

An investor shall choose the value of N according to their risk appetite, i.e.
a ratio A:B = 2:1 (assumed in our example; it will vary from investor to
investor). This means that in rising market conditions the investor would like
to keep 50% of the investment completely guarded, while the remaining 50% is
the risky component; whereas in bearish market conditions the investor would
like to keep the complete investment fully hedged by buying put options
equivalent to all the investment made in shares of the same security. Figure 7
shows the P/L curves for a position consisting of shares and 2 different put
options on company C1, purchased at different time intervals and hence at
different premium prices even with the same strike price of $X. Using a
married put strategy limits the downside risk of the investment but reduces
the rate of return, in contrast to a position consisting only of the equity
security, which is completely exposed to market risk. Hence the success of the
married put strategy depends greatly on the accuracy of predicting whether the
markets will rise or fall. Our proposed tweet sentiment analysis can be highly
effective in this prediction, determining the instances when the investor
should readjust the portfolio before the actual changes happen in the market.
Our proposed approach provides an innovative technique of using dynamic
Twitter sentiment analysis to exploit the collective wisdom of the crowd for
minimising the risk in a hedged portfolio. Below we summarize the two
portfolio states under different market conditions.

Table 4 Example investment breakdown in the two cases

Partially Hedged Portfolio at 50% risk

  1000 shares at price of $X                              = 1000X
  1 block of 500 put options purchased at strike price
  of $X with premium of $Y each                           = 500Y
  Total                                                   = 1000X + 500Y

Fully Hedged Portfolio at minimized risk

  1000 shares at price of $X                              = 1000X
  2 blocks of 500 put options each purchased at strike
  price of $X with premium of $Y each                     = 2 x 500Y = 1000Y
  Total                                                   = 1000X + 1000Y
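
The two portfolio states can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: the values X = 100 and Y = 5 are hypothetical, both blocks of options are given the same premium, the payoff is evaluated at option expiry, and transaction costs are ignored:

```python
def portfolio_cost(shares, puts, share_price, premium):
    """Initial outlay: shares bought at share_price plus puts bought at premium."""
    return shares * share_price + puts * premium


def profit_at_expiry(shares, puts, share_price, premium, final_price):
    """P/L at expiry for shares bought at share_price and puts struck at share_price."""
    stock_pl = shares * (final_price - share_price)
    put_pl = puts * (max(share_price - final_price, 0.0) - premium)
    return stock_pl + put_pl


X, Y = 100.0, 5.0  # hypothetical share price/strike and option premium
states = {"partially hedged": 500, "fully hedged": 1000}

for name, puts in states.items():
    cost = portfolio_cost(1000, puts, X, Y)
    fall = profit_at_expiry(1000, puts, X, Y, final_price=80.0)
    rise = profit_at_expiry(1000, puts, X, Y, final_price=120.0)
    print(name, cost, fall, rise)
```

Under these assumptions the fully hedged position loses only the premium paid when the price falls below the strike, while the partially hedged position keeps more of the gain when the price rises, which is the trade-off the strategy tries to time.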



To check the effectiveness of our proposed tweet-based hedging strategy, we
run simulations and make portfolio adjustments in various market conditions
(bullish, bearish, volatile etc.). To elaborate, we take DJIA ETFs as the
underlying security over the time period from 14th November 2010 to 30th June
2011. Approximately 76% of the time period is used in the training phase to
tune the SVM classifier (using tweet sentiment features from the prior week).
The trained SVM classifier is then used to predict the market direction (DJIA
index movement) in the coming week. The testing phase for the classification
model (class 1: bullish market ↑ and class 0: bearish market ↓) runs from 8th
May to 30th June 2011, consisting of a total of 11 weeks. The SVM model is
built using the KSVM classification technique with the linear (vanilladot) kernel

8 The reason behind purchase of long put options at different time intervals is because 
in a fully hedged portfolio, profit arrow has lower slope as compared to partially hedged 
portfolio (refer P/L graph). Thus the trade off between risk and security has to be carefully 
played keeping in mind the precise market conditions. 










Fig. 7 Portfolio adjustment in bearish (fully hedged) and bullish (partially hedged)
market scenarios. In both figures, the strike price is the price at which the option can
be exercised, and the break-even point (BEP) is the instance when the investment starts
making a profit. In the bearish market scenario, two options at the same strike price (but
different premiums) are purchased at different instances: Option 1 is bought at the time of
initial investment and Option 2 is bought at a later stage (hence lower in premium value).



using the package 'e1071' in the R statistical language. Over the training
dataset, the tuned value of the objective function is -4.24 and the number of
support vectors is 8. The confusion matrix of predicted against actual values
(in percentage) is given in Table 5. Overall classifier accuracy over the
testing phase is 90.91%. The receiver operating characteristic (ROC) curve,
which measures the accuracy of the classifier as true positive rate against
false positive rate, is given in Figure 8. It shows the tradeoff between
sensitivity, i.e. true positive rate, and specificity, i.e. true negative rate
(any increase in sensitivity is accompanied by a decrease in specificity).
Good classification performance can be inferred from the area under the ROC
curve (AUC), which comes out to 0.88.
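
The relation between a confusion matrix and the quantities named above can be sketched in Python. The 11 weekly labels below are hypothetical (chosen so that 10 of 11 weeks are predicted correctly, i.e. 90.91%), and treating class 1 (market up) as the positive class is an assumption; this is not the classifier output from this study:

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) with class 1 (market up) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """True positive rate."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate; the false positive rate is 1 - specificity."""
    return tn / (tn + fp)


# Hypothetical weekly directions: 1 = market up, 0 = market down.
actual = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
print(round(100 * accuracy(tp, tn, fp, fn), 2))
print(round(sensitivity(tp, fn), 3), round(specificity(tn, fp), 3))
```

With only 11 test weeks, a single misclassified week changes the accuracy by about 9 percentage points, which is worth keeping in mind when reading the reported figure.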



Table 5 Prediction accuracy over the Testing phase (11 weeks). Values in percentage.

Confusion Matrix

                                 Predicted Direction
Actual Direction                 Market Down    Market Up
Market Down                      45             9
Market Up                                       45



Figure 9 shows the DJIA index during the testing period; the arrows mark the
weeks when the portfolio is adjusted based on the prediction obtained from
tweet sentiment analysis of the prior week. At the end of each week (on
Sunday), we use the tweet sentiment features to predict the market condition
in the coming week, i.e. whether prices will go down or up. Based on the
prediction, the portfolio is adjusted from bearish to bullish or from bullish
to bearish.
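
The weekly adjustment rule can be sketched as follows. The state names, the function names and the sample predictions are hypothetical; the sketch only assumes that a predicted up week maps to the partially hedged (bullish) state and a predicted down week to the fully hedged (bearish) state:

```python
def portfolio_states(predictions):
    """Map weekly class predictions (1 = up, 0 = down) to portfolio states."""
    return ["partially hedged" if p == 1 else "fully hedged" for p in predictions]


def adjustments(predictions, initial_state):
    """List (week, old_state, new_state) for every week the portfolio changes state."""
    changes = []
    current = initial_state
    for week, state in enumerate(portfolio_states(predictions), start=1):
        if state != current:
            changes.append((week, current, state))
            current = state
    return changes


# Hypothetical predictions made each Sunday for the coming week.
weekly_predictions = [1, 1, 0, 0, 1]

for week, old, new in adjustments(weekly_predictions, "partially hedged"):
    print(f"week {week}: {old} -> {new}")
```

Only the weeks where the predicted class changes trigger a trade, so the number of adjustments (and the associated premiums paid) depends on how often the prediction flips.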
Fig. 8 Receiver operating characteristic (ROC) curve for the KSVM classifier prediction
over the testing phase. The ROC curve is a graphical plot of the sensitivity, or true
positive rate, vs. the false positive rate (one minus the specificity, or true negative
rate). The larger the area under a typical ROC curve, the better the performance of the
machine learning algorithm.

Fig. 9 DJIA index during the testing period. In the figure, the green marker shows an
adjustment from bearish to bullish, while the red arrow shows an adjustment from bullish
to bearish. (Data courtesy Yahoo! Finance)



7 DISCUSSIONS 



In section 5, we observed how the statistical behavior of the market, seen
through Twitter sentiment analysis, provides a dynamic window into investor
behavior. Furthermore, in section 6 we discussed how behavioral finance can be
exploited in portfolio decisions to make investments with greatly reduced
risk. Our work answers an important question: if someone is talking bad/good
about a company (say Apple) as a singular sentiment irrespective of the
overall market movement, is it going to affect the stock price? Among the 5
observed Twitter message features, at both same-day and lagged intervals, we
find that only some are Granger causative of the returns from the DJIA and
NASDAQ-100 indices, while changes in public sentiment are well reflected in
the return series even at lags of 1, 2 and 3 weeks. Remarkably, the most
significant result is obtained for returns at lag 2 (which can be taken as a
possible direction for the stock/index movements in the next week).

Table 6 below summarizes the different approaches taken to this problem in the
past by researchers [28], [5] and [15]. As can be seen from the table, our
approach is scalable, customizable and verified over a large data set and time
period compared with the other approaches. Our results compare favorably with
the previous work (e.g. a directional accuracy of 90.8% for DJIA against 86.7%
for the earlier mood-based approach). Furthermore, this model can be of
effective use in formulating short-term hedging strategies (using our proposed
Twitter-based prediction model).

Table 6 Comparison of Various Approaches for Modeling Market Movements Through
Twitter

Previous Approaches  Bollen et al. [5] and       Sprenger et al. [28]        This Work
                     Gilbert et al. [15]

Approach             Mood of complete Twitter    Stock discussion with       Discussion based tracking
                     feed                        ticker $ on Twitter         of Twitter sentiments

Dataset              28th Feb 2008 to 19th Dec   1st Jan 2010 to 30th June   2nd June 2010 to 29th July
                     2008, 9M tweets sampled     2010, 0.24M tweets          2011, 4M tweets through
                     as 1.5% of Twitter feed                                 search API

Techniques           SOFNN, Granger's and        OLS Regression and          Corr, GCA, Expert Model
                     linear models               Correlation                 Mining System (EMMS)

Results              * 86.7% directional         * Max corr value of 0.41    * High corr values (upto
                       accuracy for DJIA           for returns of S&P 100      -0.96) for opening price
                                                   stocks                    * Strong corr values (upto
                                                                               0.88) for returns
                                                                             * MaxAPE of 1.76% for DJIA
                                                                             * Directional accuracy of
                                                                               90.8% for DJIA

Feedback/Drawbacks   Individual modeling for     News not taken into         Comprehensive and
                     stocks not feasible         account, very low tweet     customizable approach.
                                                 volumes                     Can be used for hedging
                                                                             in F&O markets


8 CONCLUSION 

In this paper, we have worked on identifying relationships between
Twitter-based sentiment analysis of a particular company/index and its
short-term market performance using a large-scale collection of tweet data.
Our results show that negative and positive dimensions of public mood carry
improved power to track movements of individual stocks/indices. We have also
investigated various other features, such as how the previous week's sentiment
features affect the next week's opening and closing values of stock indices
for various tech companies and major indices like DJIA and NASDAQ-100.
Compared to earlier approaches in the area, which have been limited to overall
public mood and discussions restricted to stock tickers, we verify the strong
performance of our alternate model, which captures mass public sentiment
towards a particular index or company in a scalable fashion and hence empowers
an individual investor to make coherent relative comparisons. Our analysis of
individual company stocks gave strong correlation values (up to 0.88 for
returns) with the Twitter sentiment features of that company. Further, we also
discuss how Twitter sentiments bring the wisdom of the crowd into use by even
an individual investor in the form of a simple married put hedging strategy.
Using this technique, a trader can retain a portfolio with limited risk even
during highly bullish/bearish market conditions. This approach gives better
results (up to 91% directional accuracy) than the previous work summarized in
Table 6. In the near future, Twitter sentiment analysis promises to be an
effective strategy for hedging investments in the financial markets.
9 APPENDIX

The correlation heatmap indicates significant relationships between the various
Twitter features and the index features.



Fig. 10 Heatmap showing Pearson correlation coefficients between security indices and
features from Twitter and stock features.



